Wang et al. BMC Genomics 2012, 13:553 
http://www.biomedcentral.com/1471-2164/13/553 



^BMC 



Genomics 



RESEARCH ARTICLE Open Access 



Sequencing of the core MHC region of black 
grouse {Tetrao tetrix) and comparative genomics 
of the galliform MHC 

Biao Wang 1 , Robert Ekblom 2 , Tanja M Strand 1,3 , Silvia Portela-Bens 1 and Jacob Hbglund 1 



Abstract 

Background: The MHC, which is regarded as the most polymorphic region in the genomes of jawed vertebrates, 
plays a central role in the immune system by encoding various proteins involved in the immune response. The 
chicken MHC-B genomic region has a highly streamlined gene content compared to mammalian MHCs. Its core 
region includes genes encoding Class I and Class IIB molecules but is only ~92Kb in length. Sequences of other 
galliform MHCs show varying degrees of similarity as that of chicken. The black grouse {Tetrao tetrix) is a wild 
galliform bird species which is an important model in conservation genetics and ecology. We sequenced the black 
grouse core MHC-B region and combined this with available data from related species (chicken, turkey, gold 
pheasant and quail) to perform a comparative genomics study of the galliform MHC. This kind of analysis has 
previously been severely hampered by the lack of genomic information on avian MHC regions, and the galliformes 
is still the only bird lineage where such a comparison is possible. 

Results: In this study, we present the complete genomic sequence of the MHC-B locus of black grouse, which is 
88,390 bp long and contains 19 genes. It shows the same simplicity as, and almost perfect synteny with, the 
corresponding genomic region of chicken. We also use 454-transcriptome sequencing to verify expression in 17 of 
the black grouse MHC-B genes. Multiple sequence inversions of the TAPBP gene and TAP1-TAP2 gene block 
identify the recombination breakpoints near the BF and BLB genes. Some of the genes in the galliform MHC-B 
region also seem to have been affected by selective forces, as inferred from deviating phylogenetic signals and 
elevated rates of non-synonymous nucleotide substitutions. 

Conclusions: We conclude that there is large synteny between the MHC-B region of the black grouse and that of 
other galliform birds, but that some duplications and rearrangements have occurred within this lineage. The MHC-B 
sequence reported here will provide a valuable resource for future studies on the evolution of the avian MHC 
genes and on links between immunogenetics and ecology of black grouse. 



Background 

The Major Histocompatibility Complex (MHC) plays a 
central role in the immune system of all jawed vertebrates. 
It is the most polymorphic genomic region identified, and 
encodes proteins involved in the innate and adaptive im- 
mune responses [1,2]. Particularly, the MHC Class I and 
Class II genes encode proteins that bind to and carry small 
antigen peptides to the cell surface thus presenting them 
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to cytotoxic T cells or helper T cells. This in turn triggers 
the downstream immune cascade. Therefore, this genomic 
region is crucial for the organisms resistance and suscepti- 
bility to pathogenic disease [2]. 

Despite its functional consistency, the MHC genomic 
cluster has different gene organization patterns across 
different organisms. The latest genomic map of the 
human MHC (HLA) spans about 7.6 Mb and contains 
421 gene loci on a contiguous region on chromosome 6 
[3], whereas the MHC regions of other organisms gener- 
ally have a different gene order and size, or are even 
scattered on separate chromosomes [4-6]. Notably, the 
chicken (Gallus gallus) has two genetically independent 
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MHC clusters, the MHC-B and MHC-Y (previously Rfp- 
Y). Both are located on microchromosome 16 (GGA16) 
[7-11]. There has been some evidence for the gene ex- 
pression and function for disease susceptibility of the 
MHC-Y region, but it is the MHC-B that is believed to 
be the main functional MHC genomic region of chicken 
[12-15]. The highly streamlined MHC-B, which includes 
genes encoding Class I and Class IIB molecules, contains 
only 19 genes and is about 92Kb in length [14-16]. Se- 
quencing efforts have also been made on other bird spe- 
cies, such as mallard duck, red-winged blackbird, house 
finch and zebra finch [17-21]. However, none of these 
species seem to share the characteristics of the minimal 
essential chicken MHC. 

The chicken and other fowl species belong to the order 
Galliformes. Available MHC maps of other galliform birds 



generally show the same compact feature of this genomic 
region as that of chicken. For example, the MHC-B of the 
turkey (Meleagris gallopavo) has a good synteny with the 
chicken MHC-B, the only exceptions being that turkey 
MHC-B has more BG and BLB (MHC Class IIB) gene 
copies and an inversion of the TAPBP gene [22]. The quail 
(Coturnix japonica) MHC-B includes an expanded num- 
ber of duplicated genes and the numbers of the duplicated 
loci also vary to some extent among individuals [23,24]. 
The MHC-B of the golden pheasant (Chrysolophus pictus) 
also shows a good synteny with chicken, but has two 
inversions of TAPBP and TAP1-TAP2 [25]. 

Black grouse (Tetrao tetrix) is a wild galliform bird 
species that has been well-studied from an ecological 
perspective, including conservation genetics, behavioural 
ecology, sexual selection and the evolution of the lek 
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Figure 1 Sequence features of the black grouse MHC-B region. A. Position of the sequenced fosmid clones. Dotted lines indicate the 
heterozygous parts. B. Gene annotation of the MHC-B of black grouse. Different shadows indicate different MHC gene families defined from 
human MHC From dark to light: Class I, Class II, Class III, others. C. Average 454 sequencing coverage per nucleotide for each expressed region. 
D. Positions of repetitive elements and tRNAs. E. CpG islands in 100 bp window size. F. GC contents in 200 bp window size. 
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mating system [26-28]. Previous work on the black 
grouse MHC identified the MHC-B and MHC-Y gen- 
omic loci, and the polymorphism of the second exon of 
the MHC Class IIB gene has been surveyed at the popu- 
lation level [29-31]. In this paper, we investigate the 
detailed genomic organization of the black grouse 
MHC-B region. We constructed a fosmid library to se- 
quence the MHC-B genomic cluster and used Roche 
454-transcriptome sequencing (RNA-Seq) to verify the 
expression of the identified genes [32]. The results allow 
us to conduct a comprehensive comparative genomics 
analysis of the galliform MHC region. Due to a previous 
lack of genomic data on avian MHC regions this kind of 
analysis has not previously been feasible. The black 
grouse MHC sequence, together with four other com- 
pletely characterized galliform MHC regions, thus offer 
a unique opportunity in bird MHC studies. 

Results 

Sequence of the black grouse MHC-B region 

Four overlapping MHC-bearing fosmid clones with 
lengths of 29,972 bp - 40,168 bp were identified and 
sequenced (Figure 1A). They were aligned into a consen- 
sus sequence of 88,390 bp (GenBank accession number 
JQ028669). This sequence covers the majority of the 
black grouse MHC-B region (including the complete 
"core" MHC region), from the BTN1 gene to the CYP21 
gene. Since the sequenced black grouse we used was a 
wild and not inbred animal, we found clones from both 
homologous chromosomes. More specifically, P2D1 was 
found to be from a different chromosome than the other 
three clones (Figure 1A). To maximize the possibility of 
obtaining a real complete haplotype of the black grouse 
MHC, we used the combined sequences of P3B2 and 
P5B8 for the consensus sequence for the heterozygous 
parts. Therefore, our black grouse MHC sequence was 
for the most part a real haplotype, apart from the small 
gap (1,872 bp) between P3B2 and P5B8 which was only 
covered by P2D1. Sequencing both homologous chro- 
mosomes provided us the opportunity to identify poly- 
morphisms in the heterozygous parts. From the 
heterozygous overlap (25,345 bp) of P3B2 and P2D1, we 
found 275 single nucleotide polymorphisms (SNPs) and 
31 deletion-insertion polymorphisms (DIPs). From the 
much smaller overlap (2,693 bp) of the P2D1 and P5B8, 
we found 3 SNPs and 2 DIPs (Additional file 1). 

Five chicken repeats (CR) were identified, of which 
CR1-F and CR1-X1 were also found to match the 
chicken MHC-B. We also found 14 simple sequence 
repeats (SSRs, microsatellites) in the black grouse MHC-B 
region (Figure ID, Additional file 2). The average GC con- 
tent of the black grouse MHC-B region is 59.0%, 
which is as high as that of the chicken (55.5%) (Figure 1 F). 
This is probably because the region we sequenced lay 



on the gene intensive BF/BLB region, which had a 
higher GC content than the other regions. Also, the 
black grouse MHC has a high density of CpG islands 
(Figure IE), which may indicate the functional im- 
portance of this region [33]. 

Gene identification and verification 

All the three gene prediction programs used could identify 
most of the genes located on black grouse MHC-B, and 
most of the chicken, turkey and golden pheasant MHC 
genes could be well aligned with their homologous genes 
on black grouse MHC-B. Therefore, 18 genes including 
BTN1 (partial), BTN2, Blec2, Blecl, BLB1, TAPBP, BLB2, 
BRD2, DMA, DMB1, DMB2, BF1, TAP2, TAP1, BF2, C4, 
CenpA, CYP21 (partial) were confirmed at least by three 
of the above approaches (Table 1). The only exception was 
the gene BG1: Fgenesh and Genscan did not identify this 
gene and the comparison with chicken and turkey gave in- 
consistent results. Therefore, the annotation of this gene is 
only based on the result from the GeneMark prediction 
and was checked manually. 

From our RNA-Seq data, 480 reads could be mapped 
onto 17 predicted genes in the black grouse MHC-B re- 
gion, with an average mapped contig length of 209.4 bp. 
That is, 17 out of the 19 predicted genes (all except 
BTN2 and CenpA) had concrete evidence of gene ex- 
pression (Figure 1C). The gene expression levels of the 
verified genes were variable. For example, BTN1, DMB2 
and TAPBP were highly expressed, with mean sequence 
coverage per nucleotide of 34.6, 23.0 and 21.5, respect- 
ively (Table 1). The MHC Class I and Class IIB also had 
high levels of gene expression. The sequencing coverage 
per nucleotide of BF2, BLB1 and BLB2 were 16.1, 18.1 
and 12.2 respectively. In contrast, the genes BG1, Blecl, 
DMB1, TAP1 and CYP21 only had one single transcript 
read mapped each. Within genes, there was a strong 3- 
prime (including the un-translated region) bias of the 
number of the transcripts mapped; this is likely due to 
the technical nature of the cDNA library preparation 
[34]. The absence of the verification of some exons may 
also be an artefact of the library preparation, limited se- 
quencing depth or data analysis strategy, and does not 
necessarily mean that the exons are not expressed [32]. 

Comparative genomics of the galliform MHC-B 

The black grouse MHC-B genomic region shares an al- 
most perfect synteny with that of chicken, the gene 
numbers and gene orders of the two species are identical 
(Figure 2). Compared to the turkey MHC-B, black 
grouse MHC-B has less BG genes and less BLB genes, 
but the MHCs of the two species are still highly similar. 
The golden pheasant MHC-B also has more BLB genes 
than that of black grouse (Figure 3). The quail MHC-B 
has significant expansions of BLB genes and BF genes, 



Table 1 Features of the coding sequences of black grouse MHC-B genes and sequence comparisons with homologous genes in chicken, turkey, quail and 
pheasant 



Gene 



Position Strand Gene Exons 
length 
(bp) 



Exons Average 
verified* coverage per 
nucleotide 



Comparison with other galliform MHCs 



Chicken 
Nucleotide Amino 



Turkey 
Nucleotide Amino 



Quail Pheasant 
Nucleotide Amino Acid Nucleotide Amino 
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BTN1 (partial) 


1 1 2-4076 


+ 


786 


14 


1 


34.6 




















BTN2 


7150-10374 


+ 


921 


8 


0 


0 


0.797 


0.738 


0.913 


0.894 


/ 


/ 


/ 


/ 


0.470 


BG1 


12132-16240 


- 


804 


14 


1 


1.0 


0.881 


0.826 


0.851 


0.800 


0.832 


0.771 


/ 


/ 


0.335 


Blec2 


18188-20712 


- 


825 


6 


1 


5.8 


0.698 


0.595 


0.706 


0.593 


0.705 


0.607 


/ 


/ 


0.433 


Bled 


23007-24547 


+ 


471 


5 


1 


1.0 


0.934 


0.917 


0.949 


0.942 


0.927 


0.894 


0.958 


0.947 


0.230 


BLB1" 


25386-26738 


- 


792 


6 


3 


18.1 


0.920 


0.863 


0.932 


0.886 


0.884 


0.826 


0.934 


0.890 


0.805 


TAPBP 


27591-31078 


+ 


1296 


8 


/ 


21.5 


0.920 


0.882 


0.951 


0.935 


0.891 


0.856 


0.942 


0.919 


0.301 


BLB2" 


32031-33413 


+ 


792 


6 


5 


10.6 


0.933 


0.897 


0.928 


0.875 


0.890 


0.818 


0.943 


0.902 


0.924 


BRD2 


34798-40437 




2340 


12 


12 


8.7 


0.957 


0.996 


0.976 


0.996 


0.934 


0.995 


0.971 


0.997 


0.011 


DMA 


44248-46392 


+ 


789 


4 


3 


12.2 


0.913 


0.889 


0.937 


0.924 


0.856 


0.817 


0.943 


0.928 


0.375 


DMB1 


46628-48880 


+ 


1020 


8 


2 


1.0 


0.864 


0.794 


0.851 


0.793 


0.847 


0.774 


0.923 


0.876 


0.496 


DMB2 


49322-52121 


+ 


777 


6 


5 


23.0 


0.921 


0.911 


0.951 


0.938 


0.859 


0.822 


0.952 


0.955 


0.235 


BF1 *** 


52919-54907 


+ 


1056 


8 


5 


1.3 


0.858 


0.769 


0.879 


0.789 


0.835 


0.734 


0857 


0.770 


0.733 


TAP2 


56357-59553 




2106 


9 


6 


2.2 


0.927 


0.934 


0.952 


0.950 


0.897 


0.893 


0.940 


0.932 


0.186 


TAP1 


60134-64265 


+ 


1755 


11 


2 


1.0 


0.932 


0.938 


0.957 


0.966 


0.905 


0.915 


0.954 


0.949 


0.195 


BF2*** 


65364-67372 




1077 


8 


6 


16.1 


0.860 


0.749 


0.888 


0.802 


0.826 


0.737 


0.870 


0.767 


0.767 


C4 


68440-82244 


+ 


4875 


38 


4 


1.4 


0.911 


0.940 


0.933 


0.958 


/ 


/ 


0.963 


0.963 


0.207 


CenpA 


82653-84175 


+ 


429 


4 


0 


0 


0.967 


0.993 


/ 


/ 


/ 


/ 


0.968 


0.979 


0.037 


CYP21 (partial) 


84680-88358 


+ 


1326 


10 


2 


1.0 





















* Expression verified by 454 transcriptome sequencing, **BLB = MHC Class MB genes, *** BF = MHC Class I genes. 
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Figure 2 Identity matrix plotting of the nucleotide sequences of MHC-B region of black grouse itself (left) and between black grouse 
and chicken (right). Different shading of genes indicate different MHC gene families defined from human MHC. From dark to light: Class I, Class 
II, Class III, others. 



and has some pseudogenes scattered in this region, but 
the black grouse MHC-B is still in an obvious synteny 
with it. 

The most remarkable features of the galliform MHC-B 
is the gene orientation of TAPBP, TAPl and TAP2. The 



black grouse MHC-B has inversed TAPBP and TAP1- 
TAP2 blocks compared to the chicken, while only the 
TAP1-TAP2 block is inversed compared to the turkey. 
The golden pheasant shares the same gene orientation 
of TAPBP and TAP1-TAP2 block with black grouse, 
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Figure 3 Phylogenetic relationship and structural comparison of the MHC-B regions of black grouse, chicken, turkey, quail and golden 
pheasant. The phylogenetic tree is constructed with the Neighbor-joining method. Numbers next to the branch points indicate the bootstrap 
values as percentages of 1000 replicates. Pseudogenes of the quail MHC-B are not shown. Arrows and dotted lines highlight inversions and 
duplications. Numbers beside the arrows indicate the positions of the breakpoints on the compared sequences. Accession numbers: black grouse 
(JQ028669), chicken (AB268588), turkey (DQ993255), quail (AB078884), golden pheasant (JQ440366). Different shading of genes indicate different 
MHC gene families defined from human MHC. From dark to light: Class I, Class II, Class III, others. 
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where the gene orientation of these gene/gene blocks for 
quail is the same as that of chicken (Figure 3). 

Looking at the genes separately, we found that most of 
them were very similar in terms of nucleotide and amino 
acid sequence between the five galliform species 
(Table 1). However, the phylogenetic relationships of 
these genes are not consistent. The phylogenetic tree 
constructed using the entire MHC-B sequences of the 
five species (Figure 3) follows the neutral expectation 
[35]. The phylogeny of the coding sequences of TAPBP, 
BRD2, DMA, DMB1, BF1 and TAP2 share the same tree 
topology with the tree constructed using the entire 
MHC-B, whereas the phylogenetic trees for the coding 
sequences of Blecl, BLB1, BLB2, DMB2, TAP1 and BF2 
show different tree topologies within the clade of black 
grouse, turkey and golden pheasant (Figure 4). Interest- 
ingly, genes with aberrant phylogenetic relationships 
(with grouse or turkey basal to the other two species) 
showed signs of having elevated d N /d s ratios compared 
to genes following the phylogenetically neutral expect- 
ation (Figure 5). This could be interpreted as an 



indication of increased balancing selection or relaxed 
purifying selection acting on these genes. 

Discussion 

We have sequenced, annotated and analysed the MHC-B 
gene cluster of the black grouse. Black grouse is a wild 
bird species and represents the lineage Tetraoninae in the 
Galliformes [36]. With the availability of its MHC se- 
quence and several other fully sequenced galliform 
MHC we now, for the first time, have the opportunity 
to perform a comparative genomic study of avian 
MHC. The MHC-B gene cluster of black grouse is 
just as simple and streamlined as that of chicken [15] 
(Figure 3). By contrast, the quail MHC-B has more 
duplicated genes and pseudogenes (10 BLB, 7 BF and 
8 BG loci) compared to black grouse [23] (Figure 3). The 
turkey MHC-B and the golden pheasant MHC-B, which 
are phylogenetically closer to black grouse than chicken 
and quail, also have expanded BLB genes [22,25] (Figure 3). 
Our results provide additional evidence that the extremely 
compact nature of the chicken MHC is not merely an 
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Figure 4 Phylogenetic relationships of the coding sequences of the homologous genes in black grouse, chicken, turkey, quail and 
golden pheasant. The phylogenetic trees are constructed with the Neighbor-joining method. Numbers next to the branch points indicate the 
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artefact of domestication, since we find a similar pattern in 
a wild related species that is fully outbred. 

The nucleotide identity of the black grouse MHC-B 
shows high similarity with that of other galliform birds 
(Table 1). However, individual MHC genes might have dif- 
ferent evolutionary histories. The phylogenetic tree based 
on the entire MHC-B sequence shows exactly the same 
topology as neutral markers [35] (Figure 3). But when we 
used the coding sequences of each gene independently, 
only TAPBP, BRD2, DMA, DMB1, BF1 and TAP2 share 
the same tree topology with neutral genes (Figure 4). 
Interestingly, for the genes Blecl, DMB2, TAP1 and BF2, 
the black grouse is more divergent than turkey and pheas- 
ant, while for the two BLB genes (BLB1 and BLB2), black 
grouse is closer to pheasant than turkey (Figure 4, 
Additional file 3). If we use the d N /d s values to estimate 
the selection pressure on the genes, we find that the genes 
following the neutral phylogenetic expectation generally 
have lower d N /d s values than genes with aberrant tree top- 
ologies (Figure 5). Taken together the deviation from neu- 
tral phylogenetic patterns and elevated d^/ds levels 
indicates that the molecular evolution of several of the 
genes in the galliform MHC region is affected by selective 
forces. Especially, the MHC class IIB genes (BLB1 and 
BLB2) show elevated levels of d N /d s . The peptide binding 
regions of these genes are classical examples of balancing 
selection [37] . An intriguing possibility is that the cluster- 
ing of the grouse BLB and pheasant BLB might be due to 
specific selection in the wild since they were both sampled 
from natural populations, but this hypothesis needs fur- 
ther confirmation. 

Another striking finding of the comparison of galli- 
form MHC-B is the repeated inversions of the TAPBP 
gene and the TAP1-TAP2 block (Figure 3). Using data 



from all available galliform MHC sequences, we found 
that the inversion of the TAPBP gene, located between 
the two MHC class IIB loci, seems to have happened 
once in the clade; either in the lineage leading to chicken 
and quail or in the lineage of pheasant, turkey and 
grouse, depending on the ancestral state. By contrast, 
the inversion of the TAP1-TAP2 gene block has oc- 
curred at least twice (depending on what the ancestral 
state is, which we cannot tell from our data) during the 
evolution of this clade. The TAP1-TAP2 block is flanked 
by the two Class I genes, BF1 and BF2. The events of 
gene conversion or interlocus recombination in the evo- 
lution of MHC genes have been reported before 
(reviewed in [38]). Here, our result could provide an in- 
direct evidence for such events since if the gene conver- 
sion occurred repeatedly, the non-random breakpoints 
beside the two BF loci may lead to the inversion of the 
gene block TAP1-TAP2 between them. However, this 
needs to be further tested. 

In this study, we constructed a fosmid library and used 
it to screen of the MHC genes. Fosmid libraries have been 
widely used in large genome projects such as gap closure 
of the human genome or metagenomics analysis [39-41]. 
The success of our experiment demonstrates that the fos- 
mid library is also suitable and convenient to sequence 
specific genome regions of a species whose genome map 
is unavailable. To verify the expression of the identified 
MHC genes, we mapped the transcriptome data of a 454 
sequencing project to the MHC region. This allows us to 
efficiently confirm the expression of 17 identified genes. 
However, due to the limited 454 sequencing depth, it was 
not possible to cover all the 19 putatively expressed genes. 
Moreover, not all exons were verified in the expressed 
genes. This could be because of limited sequencing 



Wang et al. BMC Genomics 2012, 13:553 
http://www.biomedcentral.com/1471 -21 64/1 3/553 



Page 8 of 10 



coverage, alternative splicing or artefacts from the map- 
ping method to the short exons [42-44] . 

Conclusions 

We conclude that there is large synteny between the 
MHC-B region of the black grouse and that of other gal- 
liform birds. Some large scale changes like gene duplica- 
tions and genomic rearrangements have, however, 
occurred within the galliform lineage. Some of the genes 
in the region also seem to have been affected by selective 
forces within this clade, as inferred from deviating 
phylogenetic signals and elevated rates of non- 
synonymous substitutions. The MHC-B sequence of the 
black grouse reported here will provide a very valuable 
resource for future studies on the evolution of the avian 
MHC genes and on immunogenetics and ecology in 
black grouse. 

Methods 

Genomic sequencing 

The genomic DNA used for the sequencing of the MHC 
cluster in black grouse was extracted from a male bird 
shot near Ostersund, Sweden in November 2009. Muscle 
tissue was immediately stored in 70% ethanol, -20°C 
until use. DNA extraction followed the high molecular 
weight (HMW) protocol described by Blin et al. [45]. 
The fosmid library was constructed using the Copy Con- 
trol Fosmid Library Production Kit according to the 
manufacturer's protocol (Epicentre biotechnology, WL 
USA). DNA was first separated by pulsed field gel elec- 
trophoresis (PFGE) and 30-39 kb fragments were 
excised, purified, blunt-ended and ligated into the 
pCClFOS fosmid vectors included in the kit. Ligated 
DNA mixture was then packaged using the supplied 
lambda packaging extracts and transformed into 
EPI300-T1 phage E. coli hosts. In total the fosmid library 
consists of approximately 150,000 clones spread over 
clone pools in twenty 96-well plates. 

Screening of the library was performed by a modified 
PCR-based clone pool method [46]. Nine pairs of PCR 
primers were used to screen and pinpoint the MHC- 
bearing clones (Additional file 4). One of the primer 
pairs was developed in a previous study of black grouse 
MHC BLB exon 2 [29], while the others were developed 
from highly conserved gene regions between Chicken 
and Turkey. Four overlapping fosmid clones covering 
the core MHC Class I and Class IIB genes were selected 
to be sequenced. Shotgun subcloning and Sanger- 
sequencing of the fosmid clones were performed at 8X 
coverage by Macrogen (Macrogen Inc., Seoul, Korea). A 
primer-walking method was used to fill the shotgun se- 
quencing gaps. 

The sequencing reads were vector-trimmed, quality- 
checked and assembled using CAP3 [47]. The assembled 



fosmid clones were aligned into one consensus sequence 
using the ClustalW program implemented in Codon- 
Code Aligner 2.06 (CodonCode Corporation, MA, USA) 
[48]. For the heterozygous parts of overlapping clones, 
we used the sequences from P3B2 and P5B8 as the con- 
sensus sequence (Figure 1A). We also followed a 
genomic-alignment strategy to detect the putative single 
nucleotide polymorphisms (SNPs) in the heterozygous 
parts [49,50]. Alignment of the genomic sequences of 
the fosmid clones and manual identification of SNPs 
were conducted using the ClustalW program in Codon- 
Code Aligner 2.06. 

Gene identification 

Identification of coding regions and putative exons was 
conducted by three different gene prediction programs: 
Fgenesh (www.softberry.com), GeneMark.hmm (http:// 
exon.gatech.edu) and Genscan (http://genes.mit.edu/ 
GENSCAN.html) [51-53]. In the Fgenesh and Gene- 
Mark.hmm algorithms, the organism-specific parameters 
were all set as in the chicken; in Genscan, the para- 
meters were set as vertebrate. In addition to the auto- 
matic gene identification, we also extracted individual 
gene sequences from the chicken MHC (GenBank acces- 
sion number: AB268588 and AL023516), turkey MHC 
(GenBank accession number: DQ993255) and golden 
pheasant MHC (GenBank accession number: JQ440366), 
and used the ClustalW program in CodonCode Aligner 
to align them with the black grouse sequence to identify 
the gene positions. Finally, we manually curated the 
genes by comparing the results from all above 
approaches, as well as the RNA-Seq mapping result 
described below. Repeat elements were identified using 
Repeatmasker (www.repeatmasker.org), and tRNAs were 
identified using tRNAScan [54]. The identification of 
CpG islands and the plotting of GC contents were per- 
formed using the EMBOSS software suite [55]. 

Transcriptome sequencing and gene verification 

RNA-Seq data from a 454-transcriptome sequencing pro- 
ject was used to verify expression of the MHC genes (Gen- 
Bank short read archive number SRA036234) [56]. This 
data was generated from a male individual collected near 
Uppsala, Sweden in 2008. Spleen tissue, where many 
immune-related genes are likely to be expressed, was used 
to construct the cDNA library. The 454-sequencing was 
conducted in two partial runs of the GS FLX sequencing 
instrument (Roche) with Titanium XL reagents and 70x75 
mm PicoTiterPlates (PTP). In total 182,179 quality-filtered 
sequencing reads with average length of 321 ± 141 bp were 
used for mapping. We used the program gsMapper in 
Newbler 2.5.3 (Roche/454 Life Sciences) to map the 454- 
reads to the assembled black grouse MHC consensus se- 
quence. To make sure the mapped reads did not originate 
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from MHC-like paralogues in other genomic regions, we 
blasted the mapped reads to the entire chicken genome. 
Reads with a best hit outside the MHC region were 
excluded in further analysis. 

Comparative genomics analysis 

The identity dot matrixes of the black grouse MHC-B 
sequence and the chicken MHC-B sequence (GenBank 
accession number: AB268588) were generated using Pip- 
Maker [57]. The alignment of the entire MHC-B regions 
of the five galliform species was performed using the 
ClustalW program in CodonCode Aligner and the pro- 
gram Mauve 2.3.1 [58] and checked manually. The Gen- 
Bank accession numbers of the downloaded sequences 
are AB268588 (chicken), DQ993255 (turkey), JQ440366 
(golden pheasant) and AB078884 (quail). The molecular 
evolution model of the sequences was estimated by jMo- 
delTest [59] and the phylogenetic tree was constructed 
using the neighbor-joining method in MEGA 5.05 [60]. 
A bootstrap of 1000 replicates was used to verify the 
creditability of the tree. 

The coding sequences of the individual MHC genes 
were extracted directly from the GenBank entries of the 
above listed sequences by the GenBank online tools. For 
the quail, the BF genes beside TAP1-TAP2 block were 
used as BF1 and BF2 respectively; the BLB genes beside 
TAPBP gene were used as BLB1 and BLB2 respectively. 
The alignments of the coding sequences were also con- 
ducted using ClustalW in CodonCode Aligner. The 
phylogenetic trees were constructed following the same 
protocol as the entire MHC-B tree. The outgroup 
sequences used to construct phylogenetic trees for 
pooled BF and pooled BLB genes (in additional file 3) 
were DQ251182 (domestic goose, Anser anser) and 
DQ490139 (mallard, Anas platyrhynchos) respectively. 
To estimate the molecular selection forces, the rates of 
nonsynonymous to synonymous (d N /ds) were calculated 
using Nei-Gojobori method in the program PAML 4.6 
[61,62]. All the pairwise dN/ds values between the five 
galliform species were summarised to calculate the aver- 
age dN/ds value for the gene. 
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